Kraków
FeNeC: Enhancing Continual Learning via Feature Clustering with Neighbor- or Logit-Based Classification
Książek, Kamil, Jastrzębski, Hubert, Trojan, Bartosz, Pniaczek, Krzysztof, Karp, Michał, Tabor, Jacek
The ability of deep learning models to learn continuously is essential for adapting to new data categories and evolving data distributions. In recent years, approaches leveraging frozen feature extractors after an initial learning phase have been extensively studied. Many of these methods estimate per-class covariance matrices and prototypes based on backbone-derived feature representations. Within this paradigm, we introduce FeNeC (Feature Neighborhood Classifier) and FeNeC-Log, its variant based on the log-likelihood function. Our approach generalizes the existing concept by incorporating data clustering to capture greater intra-class variability. Utilizing the Mahalanobis distance, our models classify samples either through a nearest neighbor approach or through trainable logit values assigned to consecutive classes. Our proposal reduces to the existing approaches in a special case while extending them with more flexible adaptation to data. We demonstrate that the two FeNeC variants achieve competitive performance in scenarios where task identities are unknown and establish state-of-the-art results on several benchmarks.
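The clustering-plus-Mahalanobis idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `fit_class` and `predict`, the plain k-means loop, the single shrunk covariance per class, and the toy 2-D features standing in for backbone outputs are all assumptions made for illustration.

```python
import numpy as np

def fit_class(features, n_clusters, n_iter=20, seed=0):
    """Cluster one class's features with plain k-means (an assumed choice).

    Returns the cluster centroids and one inverse covariance for the class.
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), n_clusters, replace=False)
    centroids = features[start].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance of every sample to every centroid.
        sq = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = sq.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    # Small diagonal term keeps the covariance invertible (assumed shrinkage).
    cov = np.cov(features, rowvar=False) + 1e-3 * np.eye(features.shape[1])
    return centroids, np.linalg.inv(cov)

def predict(x, class_models):
    """Return the class whose nearest centroid has the smallest Mahalanobis distance."""
    best_label, best_dist = None, np.inf
    for label, (centroids, cov_inv) in class_models.items():
        diffs = centroids - x
        # Squared Mahalanobis distance from x to each centroid of this class.
        dists = np.einsum("kd,de,ke->k", diffs, cov_inv, diffs)
        if dists.min() < best_dist:
            best_label, best_dist = label, float(dists.min())
    return best_label

# Toy features standing in for frozen-backbone outputs (hypothetical data).
rng = np.random.default_rng(1)
class_models = {
    0: fit_class(rng.normal(0.0, 1.0, size=(200, 2)), n_clusters=2),
    1: fit_class(rng.normal(10.0, 1.0, size=(200, 2)), n_clusters=2),
}
print(predict(np.array([0.1, 0.2]), class_models))
```

With `n_clusters=1` the k-means step collapses to the class mean, so the sketch falls back to one prototype per class, which mirrors the special case mentioned in the abstract.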
Quantifying patterns of punctuation in modern Chinese prose
Dolina, Michał, Dec, Jakub, Drożdż, Stanisław, Kwapień, Jarosław, Liu, Jin, Stanisz, Tomasz
Recent research shows that punctuation patterns in texts exhibit universal features across languages. Analysis of Western classical literature reveals that the distribution of spaces between punctuation marks aligns with a discrete Weibull distribution, typically used in survival analysis. By extending this analysis to Chinese literature represented here by three notable contemporary works, it is shown that Zipf's law applies to Chinese texts similarly to Western texts, where punctuation patterns also improve adherence to the law. Additionally, the distance distribution between punctuation marks in Chinese texts follows the Weibull model, though larger spacing is less frequent than in English translations. Sentence-ending punctuation, representing sentence length, diverges more from this pattern, reflecting greater flexibility in sentence length. This variability supports the formation of complex, multifractal sentence structures, particularly evident in Gao Xingjian's "Soul Mountain". These findings demonstrate that both Chinese and Western texts share universal punctuation and word distribution patterns, underscoring their broad applicability across languages.
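As a rough illustration of the quantities discussed in this abstract, the sketch below counts the number of words between consecutive punctuation marks and evaluates a discrete Weibull probability mass function. The parameterization P(k) = (1-p)^((k-1)^β) - (1-p)^(k^β) for k = 1, 2, ... is an assumption, as are the function names, the set of punctuation marks, and the premise that the text has already been split into word and punctuation tokens (which is itself non-trivial for Chinese).

```python
import numpy as np

def discrete_weibull_pmf(k, p, beta):
    """Discrete Weibull pmf for k = 1, 2, ... (assumed parameterization)."""
    k = np.asarray(k, dtype=float)
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)

def punctuation_gaps(tokens, marks=frozenset(",.;:!?")):
    """Count words between consecutive punctuation marks.

    Zero-length gaps (adjacent marks) are skipped, and words after the last
    mark are dropped; both conventions are assumptions of this sketch.
    """
    gaps, run = [], 0
    for tok in tokens:
        if tok in marks:
            if run > 0:
                gaps.append(run)
            run = 0
        else:
            run += 1
    return gaps

tokens = "a b , c . d e f !".split()
print(punctuation_gaps(tokens))
print(discrete_weibull_pmf([1, 2, 3], p=0.2, beta=1.2))
```

For β = 1 the distribution reduces to the geometric distribution with success probability p, which gives a simple sanity check when fitting.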
Hyperspectral image segmentation with a machine learning model trained using quantum annealer
Mazur, Dawid, Rybotycki, Tomasz, Gawron, Piotr
As energy consumption becomes a major problem in the development and deployment of artificial intelligence systems, there is a need to investigate ways to reduce the resources these systems use. In this work we study how the application of quantum annealers could reduce the energy cost of training models for pixel-level segmentation of hyperspectral images. Following the results of the QBM4EO team, we propose a classical machine learning model, partially trained using a quantum annealer, for hyperspectral image segmentation. We show that the model trained using a quantum annealer is better than, or at least comparable with, models trained using alternative algorithms, according to preselected, common metrics. While a direct comparison of energy use does not make sense at the current stage of quantum computing technology development, we believe that our work proves that quantum annealing should be considered as a tool for training at least some machine learning models.
Keywords: RBM, QML, hyperspectral imaging, image segmentation
1 Introduction
The rapid growth of artificial intelligence, especially in the field of generative models [18] and since the introduction of the transformer architecture in 2017 [41], has led to a major proliferation of large deep learning models. It is becoming a major concern that the economic opportunities believed to come from the explosion of large models lead to major energy consumption related to training and using these models. To mitigate this problem, it is important to search for alternative methods of model training. In this work we employ an old idea and implement it on a new hardware device --
arXiv:2503.01400v1
Molecular Fingerprints Are Strong Models for Peptide Function Prediction
Adamczyk, Jakub, Ludynia, Piotr, Czech, Wojciech
We study the effectiveness of molecular fingerprints for peptide property prediction and demonstrate that domain-specific feature extraction from molecular graphs can outperform complex and computationally expensive models such as GNNs, pretrained sequence-based transformers and multimodal ensembles, even without hyperparameter tuning. To this end, we perform a thorough evaluation on 126 datasets, achieving state-of-the-art results on LRGB and 5 other peptide function prediction benchmarks. We show that models based on count variants of ECFP, Topological Torsion, and RDKit molecular fingerprints and LightGBM as classification head are remarkably robust. The strong performance of molecular fingerprints, which are intrinsically very short-range feature encoders, challenges the presumed importance of long-range interactions in peptides. Our conclusion is that the use of molecular fingerprints for larger molecules, such as peptides, can be a computationally feasible, low-parameter, and versatile alternative to sophisticated deep learning models.
Constrained Hybrid Metaheuristic Algorithm for Probabilistic Neural Networks Learning
Kowalski, Piotr A., Kucharczyk, Szymon, Mańdziuk, Jacek
This study investigates the potential of hybrid metaheuristic algorithms to enhance the training of Probabilistic Neural Networks (PNNs) by leveraging the complementary strengths of multiple optimisation strategies. Traditional learning methods, such as gradient-based approaches, often struggle to optimise high-dimensional and uncertain environments, while single-method metaheuristics may fail to exploit the solution space fully. To address these challenges, we propose the constrained Hybrid Metaheuristic (cHM) algorithm, a novel approach that combines multiple population-based optimisation techniques into a unified framework. The proposed procedure operates in two phases: an initial probing phase evaluates multiple metaheuristics to identify the best-performing one based on the error rate, followed by a fitting phase where the selected metaheuristic refines the PNN to achieve optimal smoothing parameters. This iterative process ensures efficient exploration and convergence, enhancing the network's generalisation and classification accuracy. cHM integrates several popular metaheuristics, such as BAT, Simulated Annealing, Flower Pollination Algorithm, Bacterial Foraging Optimization, and Particle Swarm Optimisation as internal optimisers. To evaluate cHM performance, experiments were conducted on 16 datasets with varying characteristics, including binary and multiclass classification tasks, balanced and imbalanced class distributions, and diverse feature dimensions. The results demonstrate that cHM effectively combines the strengths of individual metaheuristics, leading to faster convergence and more robust learning. By optimising the smoothing parameters of PNNs, the proposed method enhances classification performance across diverse datasets, proving its application flexibility and efficiency.
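The two-phase procedure described in this abstract (probe several optimisers, then fit with the best one) can be sketched as follows. This is only an illustration under strong assumptions: a single scalar smoothing parameter `sigma` instead of a vector of parameters, two simple stand-in optimisers (random search and a basic simulated annealing) rather than the metaheuristics used in cHM, and hypothetical names and budgets throughout.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma):
    """Gaussian-kernel PNN: choose the class with the largest mean kernel response."""
    classes = np.unique(y_train)
    sq = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-sq / (2.0 * sigma ** 2))
    scores = np.stack([kernel[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[scores.argmax(axis=1)]

def error_rate(sigma, X_tr, y_tr, X_val, y_val):
    """Fraction of validation samples the PNN misclassifies for a given sigma."""
    return float(np.mean(pnn_predict(X_tr, y_tr, X_val, sigma) != y_val))

def random_search(loss, start, budget, rng):
    best, best_loss = start, loss(start)
    for _ in range(budget):
        cand = float(rng.uniform(0.05, 5.0))
        cand_loss = loss(cand)
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss

def simulated_annealing(loss, start, budget, rng):
    cur, cur_loss = start, loss(start)
    best, best_loss = cur, cur_loss
    for step in range(budget):
        temp = 1.0 - step / budget  # linear cooling, stays positive
        cand = float(np.clip(cur + rng.normal(0.0, 0.5), 0.05, 5.0))
        cand_loss = loss(cand)
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if cand_loss <= cur_loss or rng.random() < np.exp((cur_loss - cand_loss) / (0.1 * temp)):
            cur, cur_loss = cand, cand_loss
        if cur_loss < best_loss:
            best, best_loss = cur, cur_loss
    return best, best_loss

def two_phase_fit(loss, optimisers, probe_budget=10, fit_budget=40, seed=0):
    """Probing phase: short run of each optimiser. Fitting phase: continue with the winner."""
    rng = np.random.default_rng(seed)
    probes = {name: opt(loss, 1.0, probe_budget, rng) for name, opt in optimisers.items()}
    winner = min(probes, key=lambda name: probes[name][1])
    sigma, _ = probes[winner]
    return winner, optimisers[winner](loss, sigma, fit_budget, rng)
```

A caller would wrap `error_rate` with its training and validation split into a one-argument `loss` and pass a dictionary of optimisers to `two_phase_fit`; the returned pair is the selected optimiser's name and its `(sigma, error)` result.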
Multifractal hopscotch in "Hopscotch" by Julio Cortazar
Dec, Jakub, Dolina, Michał, Drożdż, Stanisław, Kwapień, Jarosław, Stanisz, Tomasz
Punctuation is the main factor introducing correlations in natural language written texts and it crucially impacts their overall effectiveness, expressiveness, and readability. Punctuation marks at the end of sentences are of particular importance as their distribution can determine various complexity features of written natural language. Here, the sentence length variability (SLV) time series representing "Hopscotch" by Julio Cortazar are subjected to quantitative analysis with an attempt to identify their distribution type, long-memory effects, and potential multiscale patterns. The analyzed novel is an important and innovative piece of literature whose essential property is freedom of movement between its building blocks given to a reader by the author. The statistical consequences of this freedom are closely investigated in both the original, Spanish version of the novel, and its translations into English and Polish. Clear evidence of rich multifractality in the SLV dynamics, with a left-sided asymmetry, however, is observed in all three language versions as well as in the versions with differently ordered chapters.
Punctuation patterns in "Finnegans Wake" by James Joyce are largely translation-invariant
Bartnicki, Krzysztof, Drożdż, Stanisław, Kwapień, Jarosław, Stanisz, Tomasz
The complexity characteristics of texts written in natural languages are significantly related to the rules of punctuation. In particular, the distances between punctuation marks, measured by the number of words, quite universally follow the family of Weibull distributions known from survival analysis. However, the values of the two parameters that define specific forms of these distributions distinguish specific languages. This is such a strong constraint that the punctuation distributions of texts translated from the original language into another adopt the quantitative characteristics of the target language. All these changes take place within Weibull distributions such that the corresponding hazard functions are always increasing. Previous research shows that James Joyce's famous "Finnegans Wake" follows such an extreme distribution from the Weibull family that the corresponding hazard function is clearly decreasing. At the same time, the distances between sentence-ending punctuation marks, which determine the variability of sentence length, have an almost perfect multifractal organization, found so far to such an extent nowhere else in the literature. In the present contribution, based on several available translations (Dutch, French, German, Polish, Russian) of "Finnegans Wake", it is shown that the punctuation characteristics of this work remain largely translation-invariant, contrary to the common cases. These observations may constitute further evidence that "Finnegans Wake" is a translinguistic work in this respect as well, in line with Joyce's original intention.
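The distinction between increasing and decreasing hazard functions can be made explicit with a short derivation. The discrete Weibull parameterization below, with parameters p and β, is an assumption chosen for illustration; the abstract itself does not state the formula.

```latex
% Assumed discrete Weibull form for the distance k = 1, 2, \dots between punctuation marks
P(k) = (1-p)^{(k-1)^{\beta}} - (1-p)^{k^{\beta}}, \qquad 0 < p < 1,\ \beta > 0 .

% Survival function: probability that the gap is at least k words
S(k) = \sum_{j \ge k} P(j) = (1-p)^{(k-1)^{\beta}} .

% Hazard function: probability that a mark follows the k-th word, given none so far
h(k) = \frac{P(k)}{S(k)} = 1 - (1-p)^{\,k^{\beta} - (k-1)^{\beta}} .
```

Since k^β - (k-1)^β grows with k when β > 1, stays constant when β = 1, and shrinks when β < 1, the hazard function is increasing, constant (equal to p), or decreasing in these three cases respectively. Under this assumed form, a decreasing hazard function corresponds to β < 1.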
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects
Kutt, Krzysztof, Gomułka, Jakub, Miranda, Luiz do Valle, Nalepa, Grzegorz J.
In response to several cultural heritage initiatives at the Jagiellonian University, we have developed a new digitization workflow in collaboration with the Jagiellonian Library (JL). The solution is based on easy-to-access technological solutions -- Microsoft 365 cloud with MS Excel files as metadata acquisition interfaces, Office Script for validation, and MS Sharepoint for storage -- that allows metadata acquisition by domain experts (philologists, historians, philosophers, librarians, archivists, curators, etc.) regardless of their experience with information systems. The ultimate goal is to create a knowledge graph that describes the analyzed holdings, linked to general knowledge bases, as well as to other cultural heritage collections, so careful attention is paid to the high accuracy of metadata and proper links to external sources. The workflow has already been evaluated in two pilots in the DiHeLib project focused on digitizing the so-called "Berlin Collection" and in two workshops with international guests, which allowed for its refinement and confirmation of its correctness and usability for JL. As the proposed workflow does not interfere with existing systems or domain guidelines regarding digitization and basic metadata collection in a given institution (e.g., file type, image quality, use of Dublin Core/MARC-21), but extends them in order to enable rich metadata collection, not previously possible, we believe that it could be of interest to all GLAMs (galleries, libraries, archives, and museums).
Fully tensorial approach to hypercomplex neural networks
Niemczynowicz, Agnieszka, Kycia, Radosław Antoni
The fast progress in applications of Artificial Neural Networks (NN) promotes new directions of research and generalizations. This involves advanced mathematical concepts such as group theory [19], differential geometry [5, 6], or topological methods in data analysis [7]. The core of NN implementations lies in linear algebra usage.
I Read Everything Elon Musk Posted For a Week. Send Help.
Last January, not long after agreeing with an actual Nazi that Western Jews have brought antisemitism upon themselves by welcoming "hordes of minorities" to their countries, Elon Musk took a quick trip to Poland. The billionaire chief of SpaceX, Tesla, and X laid a wreath at Auschwitz and then proceeded to a symposium in Krakow, where he told the conservative commentator Ben Shapiro that social media could have averted the Holocaust and bragged that he considered himself "aspirationally Jewish." The tweet, he explained in a different interview, at a different symposium, "might be literally the worst and dumbest post I've ever done." But he did not take it down, nor has he moderated his views. If anything, his descent into the online fever swamp has only accelerated.